Calculating minimum k-unsafe and maximum k-safe sets of variables for disclosure risk assessment of individual records in a microdata set
Abstract
In the framework of disclosure control for a microdata set, a unique record is at risk of being identified. Even if a record is not unique in the microdata set, it may be considered risky if the frequency k of the cell in which the record falls is small. The notion of a minimum unsafe combination, introduced by Willenborg and de Waal (1996), is important in this respect. The purpose of this paper is to define closely related notions clearly and to give an algorithm for obtaining the relevant combinations of variables. We define minimum k-unsafe and maximum k-safe sets of variables for each record and give an illustration to show the usefulness of the proposed technique.
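To make the notions in the abstract concrete, here is a minimal brute-force sketch. It is not the algorithm of the paper, which the abstract does not spell out. It assumes that a set of variables is k-unsafe for a record when the number of records agreeing with it on those variables is at most k, and k-safe otherwise; that a minimum k-unsafe set is one with no k-unsafe proper subset; and that a maximum k-safe set is one with no k-safe proper superset. The function names and the toy data are hypothetical.

```python
from itertools import combinations


def cell_frequency(records, target, variables):
    """Count the records that agree with `target` on every variable in `variables`."""
    return sum(all(r[v] == target[v] for v in variables) for r in records)


def unsafe_and_safe_sets(records, target, variables, k):
    """Return (minimal k-unsafe sets, maximal k-safe sets) for one record.

    Assumed rule (not taken from the paper): a set of variables is
    k-unsafe when its cell frequency is <= k, and k-safe otherwise.
    """
    unsafe, safe = [], []
    # Enumerate every non-empty subset of the variables (exponential cost,
    # so this is only suitable for a handful of key variables).
    for size in range(1, len(variables) + 1):
        for combo in combinations(variables, size):
            subset = frozenset(combo)
            if cell_frequency(records, target, subset) <= k:
                unsafe.append(subset)
            else:
                safe.append(subset)
    # Minimal unsafe: no proper subset is also unsafe.
    min_unsafe = [s for s in unsafe if not any(t < s for t in unsafe)]
    # Maximal safe: no proper superset is also safe.
    max_safe = [s for s in safe if not any(s < t for t in safe)]
    return min_unsafe, max_safe


# Hypothetical toy microdata set with three key variables.
records = [
    {"sex": "F", "age": "30-39", "region": "N"},
    {"sex": "F", "age": "30-39", "region": "S"},
    {"sex": "F", "age": "40-49", "region": "N"},
    {"sex": "M", "age": "30-39", "region": "N"},
    {"sex": "M", "age": "40-49", "region": "S"},
]
key_variables = ["sex", "age", "region"]

min_unsafe, max_safe = unsafe_and_safe_sets(records, records[4], key_variables, k=1)
```

For the last record in this toy data with k = 1, every single variable is safe (each value occurs twice), while every pair of variables already isolates the record. Under the assumed definitions, the minimal 1-unsafe sets are therefore the three pairs and the maximal 1-safe sets are the three single variables.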
Similar resources
Disclosure risk assessment in statistical microdata protection via advanced record linkage
The performance of Statistical Disclosure Control (SDC) methods for microdata (also called masking methods) is measured in terms of the utility and the disclosure risk associated with the protected microdata set. Empirical disclosure risk assessment based on record linkage stands out as a realistic and practical disclosure risk assessment methodology which is applicable to every conceivable maski...
A polynomial-time approximation to optimal multivariate microaggregation
Microaggregation is a family of methods for statistical disclosure control (SDC) of microdata (records on individuals and/or companies), that is, for masking microdata so that they can be released without disclosing private information on the underlying individuals. Microaggregation techniques are currently being used by many statistical agencies. The principle of microaggregation is to group o...
A Comparative Study of Microaggregation Methods
Microaggregation is a statistical disclosure control technique for microdata. Raw microdata (i.e. individual records) are grouped into small aggregates prior to publication. Each aggregate should contain at least k records to prevent disclosure of individual information. Fixed-size microaggregation consists of taking fixed-size microaggregates (size k). Data-oriented microaggregation (with vari...
Measuring Disclosure Risk for a Synthetic Data Set Created Using Multiple Methods
Government agencies must simultaneously maintain confidentiality of individual records and disseminate useful microdata. We propose a method to create synthetic data that combines quantile regression, hot deck imputation, and rank swapping. The result from implementation of the proposed procedure is a releasable data set containing original values for a few key variables, synthetic quantile reg...
A Comparative Study of Microaggregation Methods, Josep M. Mateo-Sanz and Josep Domingo
Microaggregation is a statistical disclosure control technique for microdata. Raw microdata (i.e. individual records) are grouped into small aggregates prior to publication. Each aggregate should contain at least k records to prevent disclosure of individual information. Fixed-size microaggregation consists of taking fixed-size microaggregates (size k). Data-oriented microaggregation (with vari...